Abstract

We introduce a "Statistical Query Sampling" model, in which the goal of an algorithm is to produce
an element in a hidden set $S \subseteq \{0,1\}^n$ with reasonable probability. The algorithm gains information
about $S$ through oracle calls (statistical queries), where the algorithm submits a query function $g(\cdot)$ and
receives an approximation to $\Pr_{x \in S}[g(x) = 1]$. We show how this model is related to NMR quantum
computing, in which only statistical properties of an ensemble of quantum systems can be measured,
and in particular to the question of whether one can translate standard quantum algorithms to the NMR
setting without putting all of their classical post-processing into the quantum system. Using Fourier
analysis techniques developed in the related context of statistical query learning, we prove a number of
lower bounds (both information-theoretic and cryptographic) on the ability of algorithms to produce an
$x \in S$, even when the set $S$ is fairly simple. These lower bounds point out a difficulty in efficiently
applying NMR quantum computing to algorithms such as Shor's and Simon's algorithms that involve
significant classical post-processing. We also explicitly relate the notion of statistical query sampling to
that of statistical query learning.

1 Introduction 

Recent years have witnessed the development of a number of exciting quantum algorithms: Simon's algorithm
for the hidden XOR secret problem [28], Shor's algorithm for factoring and discrete logarithms [26, 27],
Boneh and Lipton's algorithm for the hidden subgroup problem [4], and many generalizations and
extensions [21, 11, 12, 18, 15, 17]. At the same time, work has been ongoing on various proposals for
physically realizing quantum computers. Currently, one of the most promising such proposals is based on Nuclear
Magnetic Resonance (NMR) [IQ^^IEIISl- The NMR approach works by manipulating a large ensemble 
of quantum systems in solution. One property of the NMR method, which is the focus of this paper, is 
that unlike the "standard" quantum computing model, one cannot directly measure any individual quantum 
system in the ensemble. Instead, a measurement is limited to a single qubit, and when a measurement takes 
place, the device returns (an approximation to) the expected value of this measurement, over the quantum 
systems in the ensemble. For this reason, the model for NMR is sometimes called the "expected-value" 
(EV) model 0. In contrast, the measurement in the standard quantum model yields a random sample state 
(which may consist of multiple bits) according to a classical probability distribution.

Given the distinction between the standard model and the EV model, the first question that arises is
whether it is possible to translate algorithms working in the standard model to work in the EV model. In
fact, the answer is yes. Consider any BQP algorithm [24]. Recall from the definition that a BQP algorithm
solves a decision problem, and such an algorithm has a special "target" qubit to indicate acceptance. For
a language $L$ and an input $x$, if $x \in L$, then the measurement of the target qubit will produce a "1" with
probability at least 3/4; if $x \notin L$, the probability is at most 1/4. Such an algorithm
works naturally in the EV model, since one can simply measure the target qubit, and even with significant
measurement error, use the rule that if the observed value $v > 1/2$, then $x \in L$, and otherwise $x \notin L$.
For a search (as opposed to decision) problem, we can perform the usual reduction to a series of decision 
problems, solving each one by one. In fact, many researchers have used this approach IEIEHj which we 
call an "all-inclusive" translation. 
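
A minimal sketch of the decision rule described above. The function names and the additive-error model are illustrative assumptions, not part of the EV model's definition; the point is only that the threshold test stays correct whenever the measurement error is below 1/4.

```python
def ev_measurement(accept_probability, error):
    """Idealized EV readout of the target qubit: the ensemble average plus an additive error."""
    return accept_probability + error


def ev_decide(observed_value):
    """Threshold rule from the text: report 'x in L' iff the observed value exceeds 1/2."""
    return observed_value > 0.5


# A BQP acceptance probability is >= 3/4 for x in L and <= 1/4 otherwise,
# so any additive error of magnitude below 1/4 leaves the decision unchanged.
print(ev_decide(ev_measurement(0.75, -0.2)))  # case x in L
print(ev_decide(ev_measurement(0.25, +0.2)))  # case x not in L
```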

Unfortunately, the "all-inclusive" translation can greatly increase the amount of work that must be done
by the quantum system. Consider Shor's algorithm, for instance (see Appendix A). Shor's algorithm (and
others like it) consists of a quantum sampling circuit $Q$, whose output is measured and fed into a classical
extraction circuit $C$. For the all-inclusive translation, the classical extraction circuit $C$ needs to be
"quantumized", i.e., realized by a quantum circuit and appended to the quantum sampling circuit $Q$. This can
cause a significant increase in the size of the quantum circuit (in the case of Shor's algorithm, the entire
circuitry for computing continued fractions needs to be realized as a quantum circuit), which is a rather
undesirable consequence. Even in the most optimistic scenarios, quantum computers will be orders of magnitude
more difficult to manufacture and maintain than classical computers, and thus we would like to put as little
of the complexity as possible in the quantum system. Even more serious problems emerge when more than one
sample is needed by the classical extraction circuit. For example, in Simon's algorithm, $\Omega(n)$ samples are
needed for Gaussian elimination (see Appendix A). Now the all-inclusive translation needs to manufacture
multiple copies of the quantum sampling circuit and then connect them together with the "quantumized"
classical extraction circuit. This can cause even more blowup in the size of the quantum circuit in the EV
model.
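
To make the classical extraction step concrete, the following sketch shows the kind of Gaussian elimination over GF(2) that Simon's algorithm performs on its samples: given vectors $y$ with $y \cdot s = 0$, it recovers a nonzero $s$ in their common null space. The function name and the list-of-bits representation are assumptions made for illustration; this is not the paper's circuit $C$.

```python
def nullspace_vector_gf2(rows, n):
    """Return a nonzero s with y.s = 0 (mod 2) for every row y, or None if only s = 0 works."""
    rows = [r[:] for r in rows]          # work on a copy
    pivots = []                          # pivot column of each reduced row
    r = 0
    for c in range(n):
        # find a row at or below r with a 1 in column c
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        # clear column c in every other row (Gauss-Jordan step over GF(2))
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None
    # set the first free variable to 1, the other free variables to 0,
    # and read off the pivot variables from the reduced rows
    s = [0] * n
    f = free[0]
    s[f] = 1
    for i, c in enumerate(pivots):
        s[c] = rows[i][f]
    return s


# Three independent samples orthogonal to the hidden secret s = 1011.
samples = [[0, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]
print(nullspace_vector_gf2(samples, 4))
```

With $n - 1$ linearly independent samples the null space contains exactly one nonzero vector, which is why the algorithm needs on the order of $n$ correct samples.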

In this paper, we consider the question of whether there might be more efficient translations that apply 
generally to algorithms consisting of a quantum sampling circuit Q followed by a classical extraction circuit 
C, that work without having to put the classical part of the algorithm into the quantum system. Our main 
contributions are results that answer this question in the negative, for several natural notions of "general". 
We achieve these results through a connection to the notion of statistical query learning [22] studied in
Computational Learning Theory, and in particular to a related notion that we introduce of statistical query 
sampling. Using techniques from Fourier analysis and cryptography, we show that even in cases where the 
distribution implied by Q is quite simple, it can be hard to use the EV model to generate a sample that 
can be used by C. Note that our results do not preclude the possibility of approaches tailored to specific 
quantum algorithms. For example, Collins [6] demonstrates a modification to Grover's algorithm that is
more efficient than the all-inclusive translation (see also the discussion below). However, as pointed out by 
the author, his approach does not generalize to algorithms like Shor's. 

1.1 Our model and results 

We view the quantum sampling circuit $Q$ as representing a hidden set $S \subseteq \{0,1\}^n$, and we view the classical
post-processing as a circuit $C$ such that $C(x) = 1$ for all $x \in S$. The goal of the translation procedure is
to produce some $x \in S$. To find such an $x$, the algorithm has the ability to perform a "statistical query" of
$Q$ by proposing a query function (a predicate) $g : \{0,1\}^n \to \{0,1\}$ and asking for $E_{x \in S}[g(x)]$ up to some
$1/\mathrm{poly}$ accuracy. For example, measuring the $i$th qubit corresponds to the query $g(x) = x_i$. Taking the
XOR of the first three qubits and then measuring the result corresponds to the query $g(x) = x_1 \oplus x_2 \oplus x_3$.
The algorithm may repeat this process multiple (polynomially many) times, with different query functions
$g$, and in the end must (with noticeable probability) produce an $x \in S$.

Note that this task is easy if $S$ is very large ($|S| \ge 2^n/\mathrm{poly}(n)$), since a random $x \in \{0,1\}^n$
will do. It is also easy if $S$ is very small ($|S| = \mathrm{poly}(n)$). In particular, if $|S| = \mathrm{poly}(n)$, then by
asking for an accuracy of $1/(2|S|)$ one can distinguish the case that $E_{x \in S}[g(x)] = 0$ from the case that
$E_{x \in S}[g(x)] > 0$. This allows one to walk down the bits of $x$, fixing bits from left to right, until a specific
$x \in S$ is produced. This is the key idea of [6].
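
The bit-fixing idea for small $S$ can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [6]: the toy oracle, the function names, the string representation of inputs, and the choice of a tolerance slightly tighter than $1/(2|S|)$ (so that the two cases are strictly separated) are all hypothetical.

```python
import random


def make_sqs_oracle(hidden_set, rng):
    """Toy SQS-oracle: returns E_{x in S}[g(x)] perturbed by noise of magnitude at most tol."""
    members = list(hidden_set)

    def oracle(g, tol):
        exact = sum(g(x) for x in members) / len(members)
        return exact + rng.uniform(-tol, tol)

    return oracle


def sample_small_set(oracle, n, size_bound):
    """Walk down the bits left to right, keeping a prefix that some element of S extends."""
    tol = 1.0 / (3 * size_bound)        # assumed tolerance, a bit tighter than 1/(2|S|)
    threshold = 1.0 / (2 * size_bound)
    prefix = ""
    for _ in range(n):
        candidate = prefix + "0"
        # query: does x start with the candidate prefix?
        g = lambda x, c=candidate: 1 if x.startswith(c) else 0
        if oracle(g, tol) > threshold:
            prefix = candidate          # some element of S extends prefix + "0"
        else:
            prefix = prefix + "1"       # otherwise every extension goes through "1"
    return prefix


hidden = {"0110", "1011", "1110"}
oracle = make_sqs_oracle(hidden, random.Random(0))
print(sample_small_set(oracle, n=4, size_bound=3))
```

If the true expectation is 0 the reply is at most `tol`, below the threshold; if it is at least $1/|S|$ the reply is at least $2/(3|S|)$, above the threshold. The procedure therefore uses $n$ queries and always ends inside $S$.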

We show, however, that this task is hard in general. Specifically, we give two types of hardness results. 
First, we give an information-theoretic hardness result if the query algorithm is not allowed to access C. 
That is, the translator is allowed to use the fact that the classical extraction circuit C is polynomial in size 
(so the set of accepting strings cannot be totally arbitrary) but it is not allowed to examine C — it can 
only gain information via the queries g. Second, we give a cryptographic hardness result if we assume the 
translator is given C as input, but that otherwise C is an arbitrary polynomial-size circuit. We still do not 
know if efficient translation is possible for the specific circuit C used in Shor's algorithm. 

We also consider a more general setting, in which $S$ may be large (e.g., $|S| = 2^{n-1}$), so a random
string has a reasonable chance of belonging to $S$, but the goal of the translation is to produce a string
$x \in S$ with probability substantially greater than random guessing. We call this more general setting "strong
SQ-sampling", and refer to the former setting as "weak SQ-sampling". Strong SQ-sampling models
situations such as Simon's algorithm, in which the quantum circuit produces a random $y \in \{0,1\}^n$ such
that $y \cdot s = 0$ for the hidden secret $s$. In this case, a random string has probability 1/2 of belonging
to $S$, but we need $\Omega(n)$ correct samples in a row in order to perform Gaussian elimination. We give an
information-theoretic hardness result for this problem that holds for the specific set $S$ used by Simon's
algorithm (Theorem 2).^1

1.2 Techniques and relation to Statistical Query learning 

Our results are based on a connection to the Statistical Query (SQ) learning model, first introduced by
Kearns [22] as a restricted version of the popular Probably Approximately Correct (PAC) model of Valiant [30].
In these learning models, the goal of an algorithm is to learn an approximation to a hidden function
$f : \{0,1\}^n \to \{0,1\}$. In the PAC model, the algorithm has access to an "example oracle", which
produces a random labeled sample $(x, f(x))$ upon invocation. In the SQ model, however, the algorithm does
not see explicit examples or their labels. Instead, the algorithm queries an "SQ-oracle" with predicates
$g(x, y)$, and receives an approximation to $\Pr_x[g(x, f(x)) = 1]$. For instance, the algorithm might ask for
the probability that a random example would both be positive and have its first bit set to 1 ($g(x, y) = x_1 \wedge y$).^2
The SQ model has proven to be very useful because (a) it is inherently tolerant to classification noise (this is
the reason the model was developed), and (b) nearly all machine learning algorithms can be phrased as SQ
algorithms. What makes the SQ model especially interesting is that one can information-theoretically prove
lower bounds on the ability of SQ algorithms to learn certain classes of functions [22, 3, 20, 31, 32].

The relationship between the standard model and the EV model for quantum computation is quite similar
to that between the PAC model and the SQ model in machine learning, which motivates our definition of
the Statistical Query Sampling problem. In particular, the SQ sampling problem can be viewed as the
SQ learning problem with two key differences: first, the goal is not to learn an approximation to $f$ but is
rather to produce a positive example, and second, the oracle for SQ sampling returns approximations to
$\Pr[g(x) = 1 \mid f(x) = 1]$ rather than to $\Pr[g(x, f(x)) = 1]$ (a difference that matters when the set of
positive examples is quite small).

^1 Note, for Simon's algorithm, we no longer want to think of there existing a known classical extraction circuit. If we were given
access to a circuit $C$ such that $C(x) = 1$ iff $x \in S$ (e.g., the circuit with the hidden secret built in), then the sampling goal would be
easy. See Theorem 4 for further discussion.

^2 In both PAC and SQ learning models, the distribution over $x$ need not be the uniform distribution (or even known to the
learning algorithm). However, much work on SQ learning does focus on the uniform distribution, and that is the setting we are
most interested in in this paper.
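
The difference between the two oracles is easy to see numerically when the positive set is tiny. The brute-force sketch below uses hypothetical helper names and a made-up predicate with two positive inputs: it computes the exact quantity each oracle would approximate for the "first bit is set" query.

```python
from itertools import product


def sq_learning_value(g, f, n):
    """Pr_x[g(x, f(x)) = 1] over uniform x in {0,1}^n (what an SQ learning oracle approximates)."""
    cube = list(product((0, 1), repeat=n))
    return sum(g(x, f(x)) for x in cube) / len(cube)


def sq_sampling_value(g, f, n):
    """Pr_x[g(x) = 1 | f(x) = 1] (what an SQ sampling oracle approximates)."""
    positives = [x for x in product((0, 1), repeat=n) if f(x) == 1]
    return sum(g(x) for x in positives) / len(positives)


n = 10
positive_set = {(1,) * n, (1,) + (0,) * (n - 1)}   # illustrative predicate with two positive inputs
f = lambda x: 1 if x in positive_set else 0

learning = sq_learning_value(lambda x, y: x[0] & y, f, n)
sampling = sq_sampling_value(lambda x: x[0], f, n)
print(learning, sampling)
```

Here the learning-style value is $2/2^{10}$, which an answer of $1/\mathrm{poly}$ accuracy cannot distinguish from 0, while the sampling-style value is 1 and reveals the first bit of every positive input.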

We use techniques from Fourier analysis to prove the following lower bounds. First (Theorem 1), we
show there exist simple function classes such that no algorithm, using only a polynomial number of queries
of $1/\mathrm{poly}$ accuracy, can produce a positive instance with even $1/\mathrm{poly}$ probability. Second (Theorem 2), for
the class of "negative parity" functions arising in Simon's algorithm, no algorithm using only a polynomial
number of queries of $1/\mathrm{poly}$ accuracy can produce a nontrivial positive instance with probability more than
$1/2 + 1/\mathrm{poly}$. (Note that random guessing works with probability 1/2.) We also show that, unlike the case
of SQ learning, the SQ sampling problem can be computationally hard even if $f$ is explicitly given to the
algorithm, based on cryptographic assumptions (see Theorem 3).

Finally, we explicitly relate the notion of SQ sampling to that of SQ learning by proving that if a function
class is "dense", meaning that a random element has non-negligible probability of being positive, then strong
SQ-learnability implies strong SQ-samplability (Theorem 4). We also point out that there exist function
classes that are perfectly SQ-samplable, yet not even weakly SQ-learnable.

2 Preliminaries and Definitions 

We are interested in predicates that map elements from a domain $X$ (e.g., $\{0,1\}^n$) to $\{0,1\}$. For a predicate
$f : X \to \{0,1\}$, an input $x$ is a positive input to $f$ if $f(x) = 1$; otherwise it is a negative input. The positive
inputs to $f$ form the positive set of $f$, denoted by $S_f$. A predicate class, often denoted by $C_n$, is simply a
collection of predicates over $\{0,1\}^n$. A predicate class family is an infinite sequence of predicate classes
$C = (C_1, C_2, \ldots)$, such that $C_n$ is a predicate class over $\{0,1\}^n$.

A parity function $\oplus_s(x)$ is defined to be $\oplus_s(x) = s \cdot x \bmod 2$. A negative parity function $\neg\oplus_s(x)$ is
the negation of the parity function $\oplus_s(x)$.

2.1 Statistical Query Sampling 

Definition 1 (Statistical Query Sampling Oracle) A statistical query sampling oracle (SQS-oracle) for a
predicate $f$ is denoted by $\mathrm{SQS}^f$. On input $(g, \xi)$, where $g : \{0,1\}^n \to \{-1,+1\}$ is the query function and
$\xi \in [0,1]$ is the tolerance, the oracle returns a real number $y$ such that $|y - E_{x \in S_f}[g(x)]| \le \xi$.

Definition 2 (SQ-Samplability) A predicate class family $C$ is SQ-samplable at rate $s$ in time $t$ and tolerance
$\xi$ if there exists a randomized oracle machine $Z$ such that for every $n > 0$ and every $f \in C_n$, $Z$ with access
to any SQS-oracle $\mathrm{SQS}^f$ runs in at most $t(n)$ steps, asks queries with tolerance at least $\xi$, and outputs an
$x \in S_f$ with probability at least $s(n)$. We say $C$ is strong SQ-samplable if for every $\epsilon$, $C$ is SQ-samplable
at rate $1 - \epsilon$ in time $t$ and tolerance $\xi$ such that $t$ and $1/\xi$ are polynomial in $n$ and $1/\epsilon$. We say $C$ is weak
SQ-samplable if there exists a polynomial $p$ such that $C$ is SQ-samplable at rate $1/p(n)$ in time and inverse
tolerance polynomial in $n$.

Definition 3 (Sampling Algorithms with Auxiliary Inputs) A predicate class family $C$ is SQ-samplable
with auxiliary input $\phi$ if it is SQ-samplable by an algorithm $Z$ which takes $\phi(f)$ as the auxiliary input,
where $f$ is the predicate being sampled.

3 Lower Bounds Based on Fourier Analysis



We first prove two hardness results on SQ sampling, using Fourier analysis techniques developed in the 
context of SQ learning. 

3.1 A Lower Bound on Weak SQ-Sampling 

We prove that there exist very simple families of predicate classes that are not weak SQ-samplable, i.e., no 
efficient algorithm can produce a positive input at any non-negligible rate. 

We introduce a bit more notation. We use boldface to denote a vector and index the entries of an
$n$-dimensional vector from $0$ to $n - 1$. We use $x[i]$ to denote the $i$-th entry of $x$, and $x[a..b]$ to denote the
sub-vector formed by the entries of $x$ between the $a$-th and the $b$-th, inclusive. Let $X_{n,p}$ be the set of all
$n$-dimensional vectors over $\mathbb{F}_p$ (the Galois field modulo $p$) whose last $n - 1$ entries are not all zero, i.e.,

$$X_{n,p} = \{x \in \mathbb{F}_p^n \mid x[1..n-1] \ne (0, 0, \ldots, 0)\}. \quad (1)$$

It is easy to see that $|X_{n,p}| = p^n - p$.

Definition 4 (Booleanized Linear Functions) A booleanized linear function over $X_{n,p}$ with parameter $a$
is denoted by $L_a$ and defined as

$$L_a(x) = \begin{cases} 1 & \text{if } a \cdot x = 1 \\ 0 & \text{otherwise} \end{cases} \quad (2)$$

We say $L_a$ is normalized if $a[0] = 1$. The normalized booleanized linear function class, denoted by $C_{n,p}$,
consists of all normalized booleanized linear functions over $X_{n,p}$. In other words,

$$C_{n,p} = \{L_a \mid a \in \mathbb{F}_p^n,\ a[0] = 1\} \quad (3)$$

Theorem 1 If a sampling algorithm for the normalized booleanized linear function class $C_{n,p}$ makes fewer
than $p^{n/4}$ queries, each of tolerance $1/p^{n/3}$, then the probability it produces a positive input $x \in X_{n,p}$ is at
most $1/p + 1/p^{n/12-3}$.

Notice that the requirement $x \in X_{n,p}$ is simply to rule out the trivial positive input $100\ldots0$, and we
could have equivalently just modified the definition of a "booleanized linear function" so that this specific
example is made negative. Also, notice that if we choose $p$ to be much greater than $n$, say picking $p$ to
be an $n$-bit prime number, then $1/p + 1/p^{n/12-3}$ is exponentially small, while the size of the problem is still
polynomial in $n$. Furthermore, if a completely random $x$ is picked, the probability it is a positive input is
$1/p$. Thus even exponentially many queries may only help the sampling by an exponentially small margin.
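
The counting facts used around Theorem 1 can be checked by brute force for tiny parameters. The sketch below assumes that $L_a(x) = 1$ exactly when $a \cdot x \equiv 1 \pmod p$, which is the reading consistent with $100\ldots0$ being a trivial positive input for every normalized $a$; the helper names are illustrative.

```python
from itertools import product


def domain(n, p):
    """X_{n,p}: vectors over F_p whose last n-1 entries are not all zero."""
    return [x for x in product(range(p), repeat=n) if any(x[1:])]


def booleanized_linear(a, x, p):
    """Assumed form of L_a: 1 iff a.x = 1 (mod p), else 0."""
    return 1 if sum(ai * xi for ai, xi in zip(a, x)) % p == 1 else 0


def normalized_class(n, p):
    """Parameters a of the normalized class C_{n,p}: all a with a[0] = 1."""
    return [(1,) + rest for rest in product(range(p), repeat=n - 1)]


n, p = 3, 3   # p must be prime
X = domain(n, p)
A = normalized_class(n, p)
per_input = {sum(booleanized_linear(a, x, p) for a in A) for x in X}
trivial = sum(booleanized_linear(a, (1,) + (0,) * (n - 1), p) for a in A)
print(len(X), len(A), per_input, trivial)
```

For $n = p = 3$ this gives $|X_{n,p}| = p^n - p = 24$ and $|C_{n,p}| = p^{n-1} = 9$; every $x \in X_{n,p}$ is positive for exactly $p^{n-2} = 3$ predicates (a $1/p$ fraction), whereas the excluded input $100$ is positive for all 9.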

Proof: Our proof strategy is similar to that used by Kearns [22] and Blum et al. [3] in the context of SQ
learning. We describe an "adversarial" SQS-oracle SQS that does not commit to any particular predicate
at the beginning. Rather, the oracle maintains a "candidate predicate set" $P$, which initially includes all
predicates in the class $C_{n,p}$ (a total of $p^{n-1}$ of them). Each time the algorithm $Z$ makes a query, SQS replies
with an answer that yields very little information. Some predicates in the candidate set $P$ might not be
consistent with the answer and will be removed from $P$. After all the queries are finished, SQS then
commits to a random predicate remaining in $P$. We shall prove that each query only removes a small
fraction of the predicates from $P$. Thus if $Z$ does not make enough queries, there will be
enough predicates left in $P$ that no element can be positive with high probability.

For a query function $g : X_{n,p} \to \{-1,+1\}$, we say that a subset $S \subseteq X_{n,p}$ is a $\xi$-independent subset
for $g$ if $|E_{x \in S}[g(x)] - E_{x \in X_{n,p}}[g(x)]| \le \xi$, and we say a predicate $f$ is $\xi$-independent from $g$ if its positive
set $S_f$ is a $\xi$-independent subset for $g$. Intuitively, if a predicate $f$ is $\xi$-independent from $g$, then the query
$(g, \xi)$ reveals almost no information about $f$, since SQS can reply with $E_{x \in X_{n,p}}[g(x)]$ instead, which is
completely independent of $f$.

We describe the behavior of our SQS-oracle SQS in more detail. On query $g$, SQS replies with $E_{x \in X_{n,p}}[g(x)]$
and removes all predicates that are not $\xi$-independent from $g$ from the candidate set $P$. We assume that all
queries have tolerance $\xi = 1/p^{n/3}$. We shall prove that for any query $g$, there are at most $p^{2n/3+2}$ predicates
not $p^{-n/3}$-independent from $g$. This proof is by a Fourier analysis technique and is given as Lemma 5 in
Appendix D. Thus, if fewer than $p^{n/4}$ queries are made, the candidate set still contains at least
$p^{n-1} - p^{11n/12+2}$ predicates.

Now consider the domain $X_{n,p}$. It is not hard to see that every $x \in X_{n,p}$ is positive for only $p^{n-2}$
predicates. So, if the oracle commits to a random predicate out of a set of at least $p^{n-1}(1 - p^{-n/12+3})$
candidates, the probability that $x$ is positive is at most $1/p + 1/p^{n/12-3}$. ■



3.2 A Lower Bound on Sampling Negative Parity Predicates 

We prove that a class of negative parity functions is not SQ-samplable in polynomial time at any rate non- 
negligibly higher than 1/2. 

Theorem 2 Let $X_n = \{0,1\}^n \setminus \{0^n\}$ and $C_n$ be the class of negative parity functions over $X_n$. If a sampling
algorithm for $C_n$ makes fewer than $2^{n/4}$ queries, each of tolerance $2^{-n/4}$, then the probability it produces a
positive input is at most $\frac{1}{2} + \frac{1}{2^{n/4-2}}$.

Before proving the theorem, we point out how this result relates to the translation of Simon's algorithm
to the NMR model. In Simon's algorithm, the quantum sampling circuit produces a random $y \in \{0,1\}^n$
such that $y \cdot s = 0$, where $s$ is the "hidden" secret (see Appendix A). Thus the hidden set corresponds
exactly to the negative parity function $\neg\oplus_s$. In the algorithm, the quantum sampling circuit is invoked $O(n)$
times and produces $O(n)$ samples for Gaussian elimination. Notice that $y = 0^n$ is useless. Therefore, a
translation of the quantum sampling circuit will produce an SQ-sampling algorithm $Z$ to be executed $O(n)$
times and to produce $O(n)$ positive samples in $X_n = \{0,1\}^n \setminus \{0^n\}$. However, Theorem 2 implies that it is
not possible to sample efficiently at any rate non-negligibly higher than 1/2 (notice that a random $x \in X_n$
is positive with probability almost 1/2). This result suggests that it is necessary to manufacture $O(n)$
copies of the quantum sampling circuit and run these copies together in the NMR model.

Proof sketch: The proof strategy is similar to that of Theorem 1. We assume that each query has tolerance
$\xi = 1/2^{n/4}$. We construct an SQS-oracle that on query function $g$ replies with $E_{x \in \{0,1\}^n}[g(x)]$ and
removes all predicates that are not $\xi$-independent from $g$ from the candidate set $P$ (here the definition of
"$\xi$-independent" naturally changes to $|E_{x \in S}[g(x)] - E_{x \in \{0,1\}^n}[g(x)]| \le \xi$). We shall prove (in
Appendix D) that for any query $g$, there are at most $2^{n/2+1}$ predicates not $2^{-n/4}$-independent from $g$. Thus,
if fewer than $2^{n/4}$ queries are made, the candidate set still contains at least $2^n - 2^{3n/4+1} - 1$ parity functions.
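
The counting claim can be explored by brute force for small $n$. The following sketch (hypothetical helper names; queries represented as dictionaries from bit tuples to $\pm 1$) counts, for a given query $g$, how many nonzero $s$ have a negative parity predicate that is not $\xi$-independent from $g$, using the definition of independence from the proof sketch with $\xi = 2^{-n/4}$.

```python
from itertools import product
import random


def dot2(s, x):
    """Inner product of two bit tuples modulo 2."""
    return sum(a & b for a, b in zip(s, x)) % 2


def count_dependent(g, n, xi):
    """Count nonzero s whose negative parity predicate is NOT xi-independent from g."""
    cube = list(product((0, 1), repeat=n))
    base = sum(g[x] for x in cube) / len(cube)               # E over {0,1}^n
    count = 0
    for s in cube:
        if not any(s):
            continue                                         # skip s = 0^n
        positives = [x for x in cube if any(x) and dot2(s, x) == 0]
        cond = sum(g[x] for x in positives) / len(positives)  # E over the positive set
        if abs(cond - base) > xi:
            count += 1
    return count


n = 8
xi = 2 ** (-n / 4)
cube = list(product((0, 1), repeat=n))
rng = random.Random(0)
random_query = {x: rng.choice((-1, 1)) for x in cube}
t = (1, 0, 1, 0, 0, 1, 0, 0)
aligned_query = {x: 1 if dot2(t, x) == 0 else -1 for x in cube}   # g correlated with one secret
print(count_dependent(random_query, n, xi), count_dependent(aligned_query, n, xi))
```

A query aligned with one particular secret $t$ eliminates only that one predicate, which illustrates why each answer removes so few candidates: by Parseval's identity a $\pm 1$-valued $g$ can correlate noticeably with only a few parities.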

Now consider the domain $X_n = \{0,1\}^n \setminus \{0^n\}$. It is not hard to see that every $x \in X_n$ is positive for
$2^{n-1}$ negative parity functions. Now if a random parity function is chosen from a set of size $2^n - 2^{3n/4+1} - 1$,
the probability that $x$ is positive is at most

$$\frac{2^{n-1}}{2^n - 2^{3n/4+1} - 1} \le \frac{1}{2} + \frac{1}{2^{n/4-2}}.$$

This is true for any $x \in X_n$. Therefore, whatever $Z$ outputs, the probability that it is positive is at most
$\frac{1}{2} + \frac{1}{2^{n/4-2}}$. ■
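
The arithmetic behind the final bound of Theorem 2 can be spelled out as follows. This is a sketch, assuming the candidate set has size at least $2^n - 2^{3n/4+1} - 1$ as in the proof sketch and that $n \ge 12$, so that the quantity $y$ below is at most $1/2$.

```latex
\begin{align*}
\frac{2^{n-1}}{2^{n} - 2^{3n/4+1} - 1}
  &= \frac{1}{2}\cdot\frac{1}{1 - y},
  \qquad y = 2^{-n/4+1} + 2^{-n}, \\
  &\le \frac{1}{2}\,(1 + 2y)
  \qquad \text{since } \tfrac{1}{1-y} \le 1 + 2y \text{ for } 0 \le y \le \tfrac{1}{2}, \\
  &= \frac{1}{2} + 2^{-n/4+1} + 2^{-n}
  \;\le\; \frac{1}{2} + \frac{1}{2^{n/4-2}} .
\end{align*}
```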



4 A Cryptographic Lower Bound 

We next prove a cryptographic lower bound. Assuming that one-way functions exist, we show that there 
exist predicate class families that are not weak SQ-samplable, even if the sampling algorithm is given the 
complete description of the predicate as the auxiliary input. The technique we use here is somewhat similar 
to that of Angluin and Kharitonov [1], who used signature schemes to prove that membership queries do not
help to learn DNF. 

We briefly describe the ideas behind our proof. We will use a digital signature scheme secure against
adaptive chosen message attack [14], which exists if one-way functions exist [25]. Let the predicate be
the signature verification function $\mathrm{ver}_{vk}(m, s)$, which returns 1 if $s$ is a valid signature for message $m$ with
respect to the verification key $vk$. The security of the signature scheme states that no "breaker" $B$, given
access to a signing oracle, can produce a new valid signature it has not yet seen. We want to argue that this
implies no sampling algorithm $Z$, given access to an SQ-sampling oracle, can produce any valid signature.
We will show that if such an algorithm $Z$ exists, we can construct a "breaker" $B$ as follows. The breaker
has access to a signing oracle OSign that signs any message given to it as input, and runs $Z$ as a
subroutine. The only non-trivial part for $B$ is to simulate the SQS-oracle used by $Z$ without revealing to $Z$
any information about which signatures it has already seen (so that $Z$ is not biased towards producing an
already-seen signature). Upon a query $(g, \xi)$ from $Z$, $B$ will produce a number of random messages, ask
the signing oracle to sign them, and use these samples to estimate $E_{x \in S_f}[g(x)]$. Next, $B$ "randomizes" this
estimate by adding artificial noise to it. With properly chosen parameters, this "randomized" estimate is
still a valid answer with very high probability, and yet almost independent of the messages $B$ produces.
Finally, $Z$ produces a positive input, which is a message/signature pair $(m', s')$. The distribution of this
pair $(m', s')$ is also almost independent of the messages $B$ produces, and if $Z$ makes only polynomially
many queries, then only polynomially many messages will be produced by $B$. Therefore the probability that
$m'$ is one of the messages produced by $B$ is very small, and so $B$ breaks the digital signature scheme with
reasonably high probability.

Formally, a signature scheme SIG is a triple (sig_gen, sig_sign, sig_verify) of algorithms, the first two
being probabilistic, and all running in polynomial time. sig_gen takes as input $1^n$ and outputs a
signing/verification key pair $(sk, vk)$. sig_sign takes a message $m$ and a signing key $sk$ as input and outputs
a signature $s$ for $m$. WLOG we assume that both $m$ and $s$ are $n$ bits long. sig_verify takes a message $m$,
a verification key $vk$, and a candidate signature $s'$ for $m$ as input and returns the bit $b = 1$ if $s'$ is a valid
signature for $m$ for the corresponding verification key $vk$, and otherwise returns the bit $b = 0$. Naturally, if
$s = \mathrm{sig\_sign}(sk, m)$, then $\mathrm{sig\_verify}(vk, m, s) = 1$. In an adaptive chosen message attack [14], an
adversary ("breaker") $B$ is given $vk$, where $(sk, vk) \leftarrow \mathrm{sig\_gen}(1^n)$, and tries to forge signatures with respect to
$vk$. The breaker $B$ is allowed to query a signing oracle OSign, which signs any message with respect to
$vk$, on messages of its choice. It succeeds in existential forgery if after this it can output a pair $(m, s)$, where
$\mathrm{sig\_verify}(vk, m, s) = 1$, but $m$ was not one of the messages signed by the signing oracle. A signature
scheme SIG is existentially unforgeable against adaptive chosen message attacks if there is no forging
algorithm $B$ that runs in time polynomial in $n$ and succeeds with probability $1/\mathrm{poly}(n)$. Such schemes exist if
one-way functions exist [25].

Theorem 3 Let SIG = (sig_gen, sig_sign, sig_verify) be a digital signature scheme secure against
adaptive chosen message attack. Then the predicate class family $C_n = \{\mathrm{ver}_{vk}\}$ is not weakly SQ-samplable,
even if the sampling algorithm is given $vk$ as the auxiliary input. Here $\mathrm{ver}_{vk}$ is defined to be $\mathrm{ver}_{vk}(m, s) =
\mathrm{sig\_verify}(vk, m, s)$, where $(sk, vk) \leftarrow \mathrm{sig\_gen}(1^n)$ and $m, s \in \{0,1\}^n$.

Proof: Assume to the contrary that there exists an algorithm $Z$ that weak SQ-samples the function class
$C_n = \{\mathrm{ver}_{vk}\}$. More precisely, we assume that $Z$ produces a positive input with probability $\epsilon$ by making
$q$ queries, where both $1/\epsilon$ and $q$ are bounded by a polynomial in $n$. We shall construct a polynomial-time
algorithm $B$ that breaks the signature scheme SIG with probability $\epsilon/2$, causing a contradiction.

We now describe the behavior of $B$. $B$ has access to a signing oracle OSign, and interacts with the
sampling algorithm $Z$ as the SQS-oracle. When $Z$ makes a query $(g, \xi)$, $B$ does the following. First, $B$
computes $\xi_0 = \frac{\epsilon \xi}{10q}$ and $M = \frac{2\ln(10q/\epsilon)}{\xi_0^2}$. Then $B$ draws $M$ random messages
$m_1, m_2, \ldots, m_M \in \{0,1\}^n$ and asks the signing oracle to sign all of them. Assume the signatures are
$s_1, s_2, \ldots, s_M$. Next, $B$ uses these message/signature pairs to estimate the expected value of $g$ by
computing $x = \frac{1}{M}\sum_{k=1}^{M} g(m_k, s_k)$. Then $B$ "randomizes" $x$ by drawing a $y$ uniformly at random
from the interval $[x - \xi/2,\ x + \xi/2]$ and sending $y$ to $Z$ as the answer to the query $(g, \xi)$. $B$ also
maintains a "history set" $H$ of all the messages it has generated, which is initially $\emptyset$. After a query
from $Z$ is answered, $B$ adds the messages $m_1, m_2, \ldots, m_M$ to $H$.

After all the $q$ queries are made, $Z$ produces a pair $(m', s')$. If $\mathrm{ver}_{vk}(m', s') = 1$ and $m' \notin H$, then $B$
outputs $(m', s')$ and successfully forges a signature. Otherwise $B$ aborts and announces failure.
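
The way $B$ answers a single query can be sketched in code. Everything named here is an assumption made for illustration: `toy_sign` is a keyed hash standing in for the signing oracle (it is not a signature scheme), messages are byte strings, and the parameters are taken to be $\xi_0 = \epsilon\xi/(10q)$ and $M = \lceil 2\ln(10q/\epsilon)/\xi_0^2 \rceil$, values for which the Hoeffding and statistical-distance bounds in the claims below work out.

```python
import hashlib
import math
import random


def toy_sign(message):
    """HYPOTHETICAL stand-in for the signing oracle OSign (a keyed hash, not a real signature scheme)."""
    return hashlib.sha256(b"toy-signing-key" + message).digest()


def simulate_sqs_answer(g, xi, eps, q, n_bytes, sign, history, rng):
    """Answer one query (g, xi) the way the breaker B does: estimate, then randomize."""
    xi0 = eps * xi / (10 * q)                                  # assumed accuracy of the raw estimate
    M = math.ceil(2 * math.log(10 * q / eps) / xi0 ** 2)       # assumed sample count (Hoeffding)
    total = 0
    for _ in range(M):
        m = bytes(rng.getrandbits(8) for _ in range(n_bytes))  # fresh random message
        s = sign(m)                                            # one signing-oracle call
        history.add(m)                                         # B records every message it had signed
        total += g(m, s)
    x = total / M                                              # empirical average of g
    return rng.uniform(x - xi / 2, x + xi / 2)                 # randomized answer sent to Z


rng = random.Random(1)
history = set()
g = lambda m, s: 1 if m[0] & 1 else -1     # example query with values in {-1, +1}
y = simulate_sqs_answer(g, xi=0.5, eps=0.5, q=1, n_bytes=16,
                        sign=toy_sign, history=history, rng=rng)
print(y, len(history))
```

The added uniform noise is what hides the particular sample set from $Z$: two different typical sample sets lead to nearly the same answer distribution.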

It is clear that $B$ runs in polynomial time. Intuitively, we can show that after the randomization, with
high probability the sample $(m', s')$ produced by $Z$ is almost independent of the history set $H$. Therefore,
with high probability, $m' \notin H$, and so $B$ will succeed. More precisely, we prove that $B$ will succeed with
probability at least $\epsilon/2$.

We use $S_{vk}$ to denote the positive set for predicate $\mathrm{ver}_{vk}$. In other words, $S_{vk}$ consists of valid
message/signature pairs with respect to the verification key $vk$.

Claim 1 For a query function $g$, if we define $a = E_{(m,s) \in S_{vk}}[g(m, s)]$, then with probability at least
$1 - \epsilon/5q$, we have $|x - a| < \xi_0$ (all quantities are as defined in the proof of Theorem 3).

Proof: This is due to a straightforward application of the Hoeffding bound. Each sample $(m_k, s_k)$ is an
independent random element from $S_{vk}$, and thus $E[g(m_k, s_k)] = a$. So the expected value of $x$
is $a$. Now, the probability that $M$ independent samples yield an average below $a - \xi_0$ is at most $e^{-M\xi_0^2/2}$
(notice that the range of $g$ is $\{-1,+1\}$). Also, the probability that the average is above $a + \xi_0$ is at most
$e^{-M\xi_0^2/2}$. Therefore, with probability at least $1 - 2e^{-M\xi_0^2/2} \ge 1 - \epsilon/5q$, we have $|x - a| < \xi_0$. ■

We fix a set consisting of the $M$ message/signature pairs generated by $B$ in response to a query $(g, \xi)$, and
denote this set by $U$: $U = \{(m_k, s_k)\}_{k=1}^{M}$. We call this set a sample set. We say $U$ is typical if the average
of $g(m_k, s_k)$ over $U$ is indeed $\xi_0$-close to $a$. By Claim 1, at most an $\epsilon/5q$ fraction of the sample sets are not typical.

Notice that a typical sample set will yield an average that is $\xi_0$-close to $a$. This is a much higher
accuracy than required by $Z$, which has a tolerance of $\xi$. However, $B$ needs this accuracy to perform the
randomization.

Claim 2 If $U$ is a typical set, then the answer from $B$ for this query is valid.

Proof: Notice that if $U$ is typical, then the average $x$ is $\xi_0$-close to the true value $a$. After the randomization,
it is $(\xi_0 + \xi/2)$-close to $a$. This is less than $\xi$. ■



We consider the distribution of the answer produced by $B$ for a particular query $(g, \xi)$. We denote this
distribution by $D_U$, where $U$ is the sample set used by $B$.

Claim 3 If both $U_0$ and $U_1$ are typical sets, then the statistical distance between $D_{U_0}$ and $D_{U_1}$ is at most
$\epsilon/5q$.

Proof: We use $x_0$ and $x_1$ to denote the averages obtained from $U_0$ and $U_1$, respectively. If both $U_0$ and $U_1$
are typical, we have $|x_0 - a| < \xi_0$ and $|x_1 - a| < \xi_0$. Thus we have $|x_0 - x_1| < 2\xi_0$. Notice that $D_{U_0}$ is
a uniform distribution over the interval of length $\xi$ centered at $x_0$, and $D_{U_1}$ is a uniform distribution over
an interval of the same length centered at $x_1$. The claim follows from Lemma 3. ■
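
The last step can be made explicit with the standard formula for the statistical distance between two uniform distributions on intervals of equal length. The sketch below assumes the parameter choice $\xi_0 = \epsilon\xi/(10q)$, under which the bound comes out to exactly $\epsilon/5q$; the precise statement of Lemma 3 is in the appendix and is not reproduced here.

```latex
% Two uniform distributions on intervals of length \xi whose centers differ by
% d = |x_0 - x_1| \le \xi overlap on a segment of length \xi - d, so
\begin{align*}
\Delta(D_{U_0}, D_{U_1})
  &= \frac{1}{2}\int \bigl| D_{U_0}(y) - D_{U_1}(y) \bigr| \, dy
   = \frac{|x_0 - x_1|}{\xi} \\
  &< \frac{2\xi_0}{\xi}
   = \frac{2}{\xi}\cdot\frac{\epsilon\,\xi}{10q}
   = \frac{\epsilon}{5q}.
\end{align*}
```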

Notice the history set $H$ consists of $q$ sample sets. We say a history set $H$ is typical if all its sample sets
are typical. Then at most an $\epsilon/5$ fraction of the history sets are not typical. We denote the distribution of all
answers produced by $B$ using history set $H$ by $T_H$.

Claim 4 If both $H_0$ and $H_1$ are typical, then the statistical distance between $T_{H_0}$ and $T_{H_1}$ is at most $\epsilon/5$.

Proof: This directly follows from the sub-additivity of statistical distance (see the appendix). ■

Now we fix an arbitrary typical set $H$ and denote its corresponding distribution of the answers by $T$.
Then we know the distribution from any typical set is at most $\epsilon/5$ away from $T$.

The only information $Z$ receives from $B$ is represented by the distribution of the answers produced by $B$,
which is in turn determined by the history set $B$ uses. Thus, the distribution of the pair $(m', s')$ is completely
determined by the history set $H$, and we denote this distribution by $O_H$. We know that if $H$ is typical, then
$\Pr_{(m,s) \in O_H}[\mathrm{ver}_{vk}(m, s) = 1] \ge \epsilon$. We fix the distribution $O$ that corresponds to the history set $H$
fixed above. Then we have

$$\Pr_{(m,s) \in O}[\mathrm{ver}_{vk}(m, s) = 1] \ge \epsilon. \quad (4)$$

Furthermore, we know that for any typical history set $H$, its corresponding distribution $O_H$ is $\epsilon/5$-close
to $O$.

Consider a new experiment (a new execution of the breaker B) that is identical to the original one, except 
when Z outputs a pair (m', s'), it does so according to the fixed distribution O. 

Claim 5 Let M be the maximum size of the sample sets in H. Then the success probability of the new experiment is
at least $\epsilon - M \cdot q/2^n$.

Proof: Notice that the output of Z is independent from the history set H. Moreover, the history set contains
at most $M \cdot q$ messages. So the probability that a particular m is in H is at most $M \cdot q/2^n$. This fact, along
with (4), proves the claim. ■

Now putting things together: with probability at most $\epsilon/5$, the history set H is not typical; if H is typical,
the difference between the probabilities of the two experiments is at most $\epsilon/5$; the probability of success of
the new experiment is at least $\epsilon - M \cdot q/2^n$. Therefore the probability of success of the original experiment
is at least (for n large enough) $\epsilon - M \cdot q/2^n - \epsilon/5 - \epsilon/5 > \epsilon/2$.

This finishes the proof. ■

5 SQ sampling and SQ learning



We now point out relationships between our SQ sampling model and the SQ learning model of Kearns [22].
We begin with definitions of SQ learning. (In these definitions, we assume learning is with respect to the
uniform distribution over examples.)

Definition 5 (Statistical Query Learning Oracle) A statistical query learning oracle (SQL-oracle) for a
predicate f is denoted by $SQL_f$. On an input $(g, \xi)$, where $g : \{0,1\}^n \times \{0,1\} \to \{-1,+1\}$ is the query
function and $\xi \in [0,1]$ is the tolerance, the oracle returns a real number y such that
$|y - E_{x \in \{0,1\}^n}[g(x, f(x))]| \le \xi$.
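
To make Definition 5 concrete, here is a minimal Python sketch of an SQL-oracle for small n. It computes the expectation by brute force and then returns some value within the tolerance. The name `make_sql_oracle` and the use of uniform noise are illustrative assumptions, not part of the definition; any reply within $\xi$ of the true expectation would be equally valid.

```python
import itertools
import random


def make_sql_oracle(f, n, rng=None):
    """Return an SQL-oracle for predicate f over {0,1}^n (brute force, small n only)."""
    rng = rng or random.Random(0)
    domain = list(itertools.product((0, 1), repeat=n))

    def oracle(g, xi):
        # Exact value of E_x[g(x, f(x))] under the uniform distribution.
        exact = sum(g(x, f(x)) for x in domain) / len(domain)
        # Any reply within xi of the exact value is valid; pick one at random.
        return exact + rng.uniform(-xi, xi)

    return oracle
```

A learner only ever sees such approximate averages, never individual labeled examples.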

Definition 6 (Strong SQ-Learnability) A predicate class family C is strong SQ-learnable if there exists a
randomized oracle machine Z, such that for every $n > 0$, every $f \in C_n$ and for every $\epsilon > 0$, $\delta > 0$, Z with
access to any SQL-oracle $SQL_f$ outputs a hypothesis $\hat f$ such that $\Pr_{x \in \{0,1\}^n}[\hat f(x) = f(x)] \ge 1 - \epsilon$ with
probability at least $1 - \delta$, and furthermore, both the running time of Z and the inverse of the tolerance of
each query made by it are bounded by a polynomial in n, $1/\epsilon$ and $1/\delta$. Here $\epsilon$ is called the accuracy and $\delta$
the confidence.

Definition 7 (Weak SQ-Learnability) A predicate class family C is weak SQ-learnable if there exist a
randomized oracle machine Z and a polynomial $p(\cdot)$, such that for every n and for every $f \in C_n$, Z with
access to any SQL-oracle $SQL_f$ outputs a hypothesis $\hat f$ such that $\Pr_{x \in \{0,1\}^n}[\hat f(x) = f(x)] \ge 1/2 + 1/p(n)$,
and furthermore, both the running time of Z and the inverse of the tolerance of each query made by Z are
bounded by a polynomial in n.

The first observation to make is that a predicate class can be strongly SQ-learnable and yet not even
weakly SQ-samplable. In particular, any class with a sufficiently low density of positive examples can be
trivially learned by producing the "all zero" hypothesis. (Formally, if we wish to be correct even for values
of $\epsilon$ that are exponentially small, it suffices to have the density less than $1/2^{n/2}$ so that if necessary we can
use the SQL oracle to identify all positive examples.) In the other direction, a class can be strongly
SQ-samplable and yet not even weakly SQ-learnable. Indeed, the family of negative parity functions taken over
the domain $\{0,1\}^n$ is trivially SQ-samplable (because $f(0^n) = 1$ for any such $f$), but such functions are
not even weakly SQ-learnable (2T\. It is interesting to compare this to Theorem 2, since the predicate class
families in these two theorems are very similar (one can think of the difference either as removing $0^n$ from
the domain, or simply as changing the values of the functions at this one point), yet they have completely
different characterizations in terms of SQ-samplability.

However, we show there is a relationship between these notions when the set of positive examples is 
sufficiently dense. 

5.1 SQ-learnability sometimes implies SQ-samplability 

We prove that under certain circumstances, SQ-learnability implies SQ-samplability. 

Definition 8 (Density of Predicates) The density of a predicate $f : \{0,1\}^n \to \{0,1\}$, denoted by $\rho(f)$, is
the fraction of its inputs that are positive. In other words, $\rho(f) = \Pr_{x \in \{0,1\}^n}[f(x) = 1]$.

Definition 9 (Dense Predicates) A predicate class family C is dense if there exists a polynomial $p(\cdot)$ such
that for every n and for every $f \in C_n$, $\rho(f) \ge 1/p(n)$.

Theorem 4 If a dense predicate class family is strong SQ-learnable, then it is also strong SQ-samplable
with the auxiliary input $\rho$.

Proof: Let Z be the algorithm that strongly SQ-learns the dense predicate family C. We construct a new
algorithm A that strongly SQ-samples C using the density $\rho$ of the predicate $f$ as auxiliary input. A runs a
copy of Z, whose accuracy and confidence are set to be $\epsilon = \rho \cdot \epsilon' / (4 \ln(4/\epsilon'))$ and $\delta = \epsilon'/4$, and simulates the
SQL-oracle used by Z. We shall prove that A produces a positive input with probability at least $1 - \epsilon'$.
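
A quick arithmetic check shows why these settings are natural. Taking $\epsilon = \rho \cdot \epsilon' / (4 \ln(4/\epsilon'))$ and $\delta = \epsilon'/4$, the failure bound $3\delta + \ln(1/\delta) \cdot \epsilon/\rho$ that appears at the end of the proof comes out to exactly $\epsilon'$. The function names below are hypothetical helpers for this check only.

```python
import math


def learner_parameters(rho, eps_prime):
    """Accuracy and confidence handed to the learner Z (as read from the proof)."""
    delta = eps_prime / 4
    eps = rho * eps_prime / (4 * math.log(4 / eps_prime))
    return eps, delta


def failure_bound(rho, eps_prime):
    """Union bound on the failure probability: 3*delta + ln(1/delta) * eps / rho."""
    eps, delta = learner_parameters(rho, eps_prime)
    return 3 * delta + math.log(1 / delta) * eps / rho
```

For example, with $\rho = 0.1$ and $\epsilon' = 0.2$, the three $\delta$ terms contribute 0.15 and the hypothesis-error term contributes 0.05.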

We now describe the behavior of A. A works in two phases. In the first phase, it simulates the
SQL-oracle $SQL_f$. When Z submits a query $(g, \xi)$ to A, A does the following.

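1. Set the sample size M large enough that the Hoeffding bound applies with deviation $\xi/3$ and failure
probability $\delta/q$, draw M independent samples $x_1, x_2, \ldots, x_M$ from $\{0,1\}^n$, and compute

$$s = \frac{1}{M} \sum_{i=1}^{M} g(x_i, 0).$$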

2. Construct two query functions $g_0(x) = g(x, 0)$ and $g_1(x) = g(x, 1)$. Submit queries $(g_0, \xi/3)$ and
$(g_1, \xi/3)$ to the SQS-oracle $SQS_f$ and receive $y_0$ and $y_1$ as answers.

3. Compute $y = s + (y_1 - y_0) \cdot \rho$ and send y to Z as the answer to the query $(g, \xi)$.
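
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: `make_sqs_oracle` is a hypothetical stand-in that answers exactly (a real SQS-oracle may err by up to the tolerance), and the sample size is passed in by the caller rather than derived from the Hoeffding bound.

```python
import itertools
import random


def make_sqs_oracle(f, n):
    """Hypothetical exact SQS-oracle: returns E_{x in S_f}[g(x)] with zero error."""
    positives = [x for x in itertools.product((0, 1), repeat=n) if f(x) == 1]
    return lambda g, xi: sum(g(x) for x in positives) / len(positives)


def simulate_sql_answer(g, xi, sqs_oracle, rho, n, num_samples, rng):
    """Answer the SQL query (g, xi) using only an SQS-oracle and the density rho."""
    # Step 1: estimate E_x[g(x, 0)] from uniform samples (no oracle needed).
    samples = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(num_samples)]
    s = sum(g(x, 0) for x in samples) / num_samples
    # Step 2: query the SQS-oracle about g(., 0) and g(., 1) on the hidden set.
    y0 = sqs_oracle(lambda x: g(x, 0), xi / 3)
    y1 = sqs_oracle(lambda x: g(x, 1), xi / 3)
    # Step 3: rescale the difference by the density and combine.
    return s + (y1 - y0) * rho
```

The rescaling by `rho` in the last line is the only place where the auxiliary input is used.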

The algorithm A enters the second phase when Z produces a hypothesis $\hat f$. Then A repeats the following
procedure. It draws a random $x \in \{0,1\}^n$ and checks whether $\hat f(x) = 1$. If so, it stops and outputs x; otherwise
it continues. The procedure is repeated $\ln(1/\delta)/\rho$ times, and if A still has not stopped, it produces a random
$x \in \{0,1\}^n$ and outputs it.
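
The second phase is plain rejection sampling against the hypothesis. A minimal sketch, assuming the hypothesis is given as a Python callable on bit tuples (the name `second_phase` and the rounding of the round count are illustrative choices):

```python
import math
import random


def second_phase(hypothesis, n, rho, delta, rng):
    """Draw uniform points until the hypothesis accepts one, for ln(1/delta)/rho rounds."""
    rounds = math.ceil(math.log(1 / delta) / rho)
    for _ in range(rounds):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if hypothesis(x) == 1:
            return x
    # Fallback when every round was rejected: output a uniformly random point.
    return tuple(rng.randint(0, 1) for _ in range(n))
```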

It is clear that A runs in polynomial time. Now, we prove that A produces a positive sample with
probability at least $1 - \epsilon'$.

First, we prove that with probability at least $1 - \delta$, all answers provided by A are valid in the first
phase. Consider an average s as an approximation of $E_{x \in \{0,1\}^n}[g(x, 0)]$. We say s is "bad" if
$|s - E_{x \in \{0,1\}^n}[g(x, 0)]| > \xi/3$. Then a simple application of the Hoeffding Bound (see Appendix B) proves
that the probability that s is bad is at most $\delta/q$.

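Next, notice that

$$g(x, f(x)) = g(x, 0) + [g(x, 1) - g(x, 0)] \cdot f(x).$$

Therefore we have

$$\begin{aligned}
E_{x \in \{0,1\}^n}[g(x, f(x))] &= E_{x \in \{0,1\}^n}[g(x, 0)] + E_{x \in \{0,1\}^n}[(g(x, 1) - g(x, 0)) \cdot f(x)] \\
&= E_{x \in \{0,1\}^n}[g(x, 0)] + \rho \cdot \left( E_{x \in S_f}[g(x, 1)] - E_{x \in S_f}[g(x, 0)] \right).
\end{aligned}$$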

Therefore, if s is not bad, then the y computed by A is a valid reply to the query $(g, \xi)$: its total error is at
most $\xi/3 + \rho \cdot (\xi/3 + \xi/3) \le \xi$. Since Z makes a total of q queries, with probability at least $1 - \delta$, all the
replies by A are valid and Z should perform well.

Next, consider the second phase of A. With probability at least $1 - \delta$, Z should produce a hypothesis $\hat f$
that agrees with $f$ with probability at least $1 - \epsilon$. Let us assume that Z does produce such an $\hat f$. Now since a $\rho$
fraction of the inputs are positive, the probability that A doesn't draw a positive input in $\ln(1/\delta)/\rho$ rounds is
at most $\delta$. The probability that $\hat f$ makes a mistake in any of the rounds is at most $\ln(1/\delta) \cdot \epsilon/\rho$. If $\hat f$ doesn't
make any mistakes and at least one positive input is drawn, then A will correctly output it.

Putting everything together, we know that with probability at least $1 - 3\delta - \ln(1/\delta) \cdot \epsilon/\rho = 1 - \epsilon'$, A will
output a positive input. ■

We remark that it appears necessary for the SQ-sampling algorithm to have the density $\rho$ as an auxiliary
input. One difference between SQ-sampling and SQ-learning is the resolution. In the reply of an
SQS-oracle, the underlying distribution is uniform over the "hidden set" $S_f$; for an SQL-oracle, the distribution is
uniform over the entire set $\{0,1\}^n$. Therefore, a sampling algorithm needs to know the size of $S_f$ in order
to perform the simulation (more precisely, in step 3 of the first phase).

It is interesting to compare this result to Theorem 3, which shows a predicate class family that is perfectly
SQ-learnable, but not even weakly SQ-samplable. Nevertheless, there is no contradiction since the predicate
class family in Theorem 3 is not dense.

A Shor's Algorithm and Simon's Algorithm

We briefly summarize Shor's algorithm for factoring and Simon's algorithm for the hidden XOR-secret 
problem.

A.1 Shor's Algorithm for Factoring

Standard number theory reduces factoring N to finding the order of a random element a modulo N, i.e.,
$r > 0$ such that $a^r \equiv 1 \pmod{N}$ but $a^s \not\equiv 1 \pmod{N}$ for any $0 < s < r$. Suppose 2"-^ < < 2".
Shor's algorithm uses 2n qubits, separated into two n-qubit registers. Initially the state is
$|\phi_0\rangle = |0^n\rangle|0^n\rangle$. By applying the Fourier transformation followed by modular exponentiation, this state
is converted to $|\phi_1\rangle = \sum_x |x\rangle|a^x \bmod N\rangle$. Then one measures the second register and discards it,
leading to a state $|\phi_2\rangle = \sum_t |t \cdot r + c\rangle$ for some random $c \in [r]$, where t ranges from 0 to $\lfloor (2^n - 1 - c)/r \rfloor$
(we ignore the scalar factor). Finally, one applies the inverse Fourier transform to the first register followed
by a measurement. The distribution of the measurement result is approximately uniform over
$\{[t \cdot 2^n/r] : 0 \le t \le \lfloor (2^n - 1 - c)/r \rfloor\}$. One can then solve for r from one instance of $[t \cdot 2^n/r]$ using continued fractions.
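
The classical continued-fraction step can be sketched in Python with the standard library, since `Fraction.limit_denominator` returns the best rational approximation with a bounded denominator. The sketch assumes the first register is large enough for this step to work (the usual requirement is $2^n \ge N^2$); the function names are hypothetical.

```python
from fractions import Fraction


def candidate_order(measurement, n_bits, N):
    """Guess the order r from a measurement close to t * 2^n / r."""
    # Best rational approximation t/r of measurement / 2^n with r < N.
    approx = Fraction(measurement, 2 ** n_bits).limit_denominator(N - 1)
    return approx.denominator


def is_order(a, r, N):
    """Check that r is the multiplicative order of a modulo N."""
    return pow(a, r, N) == 1 and all(pow(a, s, N) != 1 for s in range(1, r))
```

When t and r share a common factor, the returned denominator is only a proper divisor of r, so in practice the candidate is verified with `is_order` and the experiment is repeated if the check fails.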

A.2 Simon's Problem and Algorithm 

A function $f : \{0,1\}^n \to \{0,1\}^n$ is given as an oracle, with the promise that there exists an $s \in \{0,1\}^n$
(known as the "hidden secret") such that $f(x) = f(y)$ iff $x \oplus y \in \{0^n, s\}$. Notice that if $s = 0^n$, then f is a
permutation, and otherwise f is a 2-to-1 function. The problem is to tell if $s = 0^n$.

Simon's algorithm works as follows. One starts with 2n qubits, separated into two n-qubit registers.
Originally one initializes the state to $|\phi_0\rangle = |0^n\rangle|0^n\rangle$. Next, one applies the Hadamard operator
to the first register and then the oracle operator $|x\rangle|y\rangle \mapsto |x\rangle|f(x) \oplus y\rangle$. The state becomes
$|\phi_1\rangle = 2^{-n/2} \sum_x |x\rangle|f(x)\rangle$. Next, the second register is measured and discarded. If $s = 0^n$, then the resulting
state is $|\phi_2\rangle = |x\rangle$ for a random $x \in \{0,1\}^n$. If $s \ne 0^n$, then the resulting state is $|\phi_2\rangle = \frac{1}{\sqrt{2}}(|x\rangle + |x \oplus s\rangle)$
for a random x. Next, a Hadamard operator is applied to the first register. In the case $s = 0^n$, the result is
$|\phi_3\rangle = |y\rangle$ for a random y; in the case $s \ne 0^n$, the result is $|\phi_3\rangle = |y\rangle$ for a random y such that $y \cdot s = 0$.
Finally one measures the first register and obtains y. Repeating the experiment $O(n)$ times, one can solve
for s by using Gaussian elimination and distinguish the case $s = 0^n$ from the case $s \ne 0^n$.
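
The classical post-processing in Simon's algorithm is Gaussian elimination over GF(2). A minimal sketch, assuming each measured y is encoded as an integer bitmask (the function name and the return convention are illustrative choices, not from the paper):

```python
def solve_hidden_secret(ys, n):
    """Return the s in {0,1}^n (as an int bitmask) with y . s = 0 for every y in ys.

    Returns 0 when the samples have full rank (only s = 0^n is consistent) and
    None when the rank is below n - 1, i.e. more samples are needed.
    """
    pivots = {}  # pivot bit -> row, kept in reduced row echelon form over GF(2)
    for y in ys:
        # Eliminate the existing pivot bits from the new sample.
        for bit, row in pivots.items():
            if (y >> bit) & 1:
                y ^= row
        if y:
            bit = y.bit_length() - 1
            # Clear the new pivot bit from earlier rows to stay fully reduced.
            for b in list(pivots):
                if (pivots[b] >> bit) & 1:
                    pivots[b] ^= y
            pivots[bit] = y
    rank = len(pivots)
    if rank == n:
        return 0
    if rank < n - 1:
        return None
    # Exactly one free variable: set it to 1 and read off the pivot variables.
    free = next(b for b in range(n) if b not in pivots)
    s = 1 << free
    for bit, row in pivots.items():
        if (row >> free) & 1:
            s |= 1 << bit
    return s
```

This is precisely the kind of post-processing that works on individual outcomes y and is therefore not directly available when only ensemble averages can be measured.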



B The Hoeffding Bound

We state the Hoeffding Bound, a classical result in estimating tail probabilities.

Lemma 1 (Hoeffding Bound [19]) Let $k = (p - \epsilon)n$, where $\epsilon$ is a real number between 0 and 1/2, and p
is a real number between 0 and 1. We have

$$\sum_{j=0}^{k} \binom{n}{j} p^j (1-p)^{n-j} \le e^{-2\epsilon^2 n}. \tag{5}$$

C Statistical Distance

We define the statistical distance and state some of its properties. The definitions and the results are standard.
A good reference on the statistical distance is Vadhan's thesis [29].

Definition 10 (Statistical Distance) The statistical distance between two probability distributions A and B,
denoted as $SD(A, B)$, is defined to be

$$SD(A, B) = \frac{1}{2} \sum_x |A(x) - B(x)| \tag{6}$$

where the summation is taken over the support of A and B. If $SD(A, B) \le \epsilon$, we say A is $\epsilon$-close to B.

This definition can be easily extended to the continuous case with the summation being replaced by 
integral and the distributions replaced by density functions. 

Lemma 2 Let $T(x)$ be a probabilistic event with x as input. Let A and B be two distributions. We have

$$\Pr_{x \in A}[T(x)] - \Pr_{x \in B}[T(x)] \le SD(A, B). \tag{7}$$



Lemma 3 (Sub-additivity) Let $A_1, A_2, B_1, B_2$ be distributions. Then we have

$$SD(A_1 B_1, A_2 B_2) \le SD(A_1, A_2) + SD(B_1, B_2) \tag{8}$$

where AB denotes the tensor product of the distributions A and B, i.e., $AB(a, b) = A(a) \cdot B(b)$. ■
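
For finite distributions, Definition 10 and the sub-additivity property are easy to check numerically. The sketch below represents a distribution as a Python dict from outcomes to probabilities; this representation and the helper names are assumptions made for illustration.

```python
from itertools import product


def statistical_distance(A, B):
    """SD(A, B) = 1/2 * sum_x |A(x) - B(x)| over the union of the supports."""
    support = set(A) | set(B)
    return 0.5 * sum(abs(A.get(x, 0.0) - B.get(x, 0.0)) for x in support)


def tensor(A, B):
    """Product distribution AB(a, b) = A(a) * B(b)."""
    return {(a, b): pa * pb for (a, pa), (b, pb) in product(A.items(), B.items())}
```

For instance, with $A_1 = (0.5, 0.5)$, $A_2 = (0.75, 0.25)$, $B_1 = (0.9, 0.1)$ and $B_2 = (0.6, 0.4)$ over $\{0, 1\}$, the product distributions are at distance 0.3, below the bound $0.25 + 0.3$.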

Lemma 4 Let $D_1$ be a uniform distribution over an interval $[a, a+l]$ and $D_2$ a uniform distribution over
$[b, b+l]$. Then $SD(D_1, D_2)$ is at most $|a - b|/l$.

Proof: Notice that both $D_1$ and $D_2$ are uniform distributions of the same length, and thus their density
functions have value $1/l$ over their supports and 0 elsewhere. Consider the absolute difference between the two
density functions, $|D_1(x) - D_2(x)|$. The size of its support is at most $2|a - b|$, and on that support it equals $1/l$.
Thus $SD(D_1, D_2) \le \frac{1}{2} \cdot 2|a - b| \cdot \frac{1}{l} = |a - b|/l$. ■
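
The bound of Lemma 4 is tight until the two intervals become disjoint. A small sketch of the exact distance (the function name is a hypothetical helper):

```python
def uniform_interval_distance(a, b, length):
    """Exact statistical distance between U[a, a+length] and U[b, b+length]."""
    # The two densities (both 1/length) cancel on the overlap of the intervals.
    overlap = max(0.0, length - abs(a - b))
    return 1.0 - overlap / length
```

In the setting of Claim 3 the two interval centers differ by less than $2\xi_0$ and the length is $\xi$, so the distance is below $2\xi_0/\xi$.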

D Proofs 

Lemma 5 Let $X_{n,p}$ be the domain defined earlier and $C_{n,p}$ the class of normalized booleanized linear
functions over $X_{n,p}$. For any query function $g : X_{n,p} \to \{0,1\}$, there are at most $p^{2n/3+2}$ predicates in $C_{n,p}$
that are not $1/p^{n/3}$-independent from g.

For the proof we will need: 

Lemma 6 ([31]) Let $Q = \{f_i\}$ be a set of functions of range $\{-1,+1\}$ and d be its cardinality. If $\langle f_i, f_j \rangle = \lambda$
for all $i \ne j$, then the set $\{\tilde f_i\}$ forms an orthonormal basis for the linear space spanned by Q, where

$$\tilde f_i = \frac{1}{\sqrt{1-\lambda}} f_i - \frac{1}{d} \left( \frac{1}{\sqrt{1-\lambda}} - \frac{1}{\sqrt{1+(d-1)\lambda}} \right) \sum_{j=1}^{d} f_j.$$

Proof of Lemma 5: We first slightly modify the class $C_{n,p}$ so that its range becomes $\{-1,+1\}$. We define
$\hat L_a(x) = 2 \cdot L_a(x) - 1$. It is not hard to see that each of the $p^{n-1}$ normalized booleanized linear functions
maps a $1/p$ fraction of the elements in $X_{n,p}$ to +1, and a straightforward but tedious analysis (see [31] for
a detailed account) shows that any two normalized booleanized linear functions agree at exactly
$(p^2 - 2p + 2)p^{n-2} - p$ places in $X_{n,p}$. We define an inner product between functions over $X_{n,p}$ as

$$\langle f, g \rangle = \frac{1}{p^n - p} \sum_{x \in X_{n,p}} f(x) g(x). \tag{9}$$

With this inner product, any query function has norm 1, and any pair of distinct functions $\hat L_a$ and $\hat L_b$ have
the same inner product. This will allow us to "extract" an orthonormal basis from the class $C_{n,p}$ using
Lemma 6.
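
The orthonormalization behind Lemma 6 can be checked numerically on a toy family. The explicit transformation used here is an assumption: $\tilde f_i = \alpha f_i - \frac{1}{d}(\alpha - \gamma) \sum_j f_j$ with $\alpha = 1/\sqrt{1-\lambda}$ and $\gamma = 1/\sqrt{1+(d-1)\lambda}$, which is one choice that can be verified to be orthonormal. The toy functions $f_i(x) = +1$ if $x = i$ and $-1$ otherwise are also an illustrative choice, not the booleanized linear functions of the paper.

```python
import math


def inner(f, g):
    """Normalized inner product <f, g> = (1/|X|) * sum_x f(x) g(x)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)


def orthonormalize(fs, lam):
    """Map functions with pairwise inner product lam to an orthonormal family."""
    d = len(fs)
    alpha = 1 / math.sqrt(1 - lam)
    gamma = 1 / math.sqrt(1 + (d - 1) * lam)
    beta = (alpha - gamma) / d
    total = [sum(col) for col in zip(*fs)]  # pointwise sum of all f_j
    return [[alpha * v - beta * t for v, t in zip(f, total)] for f in fs]
```

With d = 5 functions on a 5-point domain the pairwise inner product is $\lambda = 0.2$, and the Gram matrix of the transformed family is the identity up to rounding.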

Now we fix a query function g and relate predicates that are not $\xi$-independent from g to the Fourier
coefficients of g. Consider a booleanized linear function $L_a$, and we denote its positive set by S. We have
that $|S| = p^{n-1} - 1$. Suppose g maps a elements in $X_{n,p}$ to +1, and b elements in S to +1. Then if $L_a$ is
not $\xi$-independent from g, we have

$$\left| \frac{2a - p^n + p}{p^n - p} - \frac{2b - p^{n-1} + 1}{p^{n-1} - 1} \right| > \xi, \tag{10}$$

or $|a - bp| > \frac{p^n - p}{2}\,\xi$. We write $b = a/p + \delta$, and we have $|\delta| > \frac{p^{n-1} - 1}{2}\,\xi$.

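Next we compute the inner product of g and the $\pm 1$-valued function $\hat L_a = 2 L_a - 1$. Straightforward
computation shows that

$$\langle g, \hat L_a \rangle = 2 \cdot \frac{2b - a + (p-1)(p^{n-1} - 1)}{p^n - p} - 1 = \left(1 - \frac{2a}{p^n - p}\right)\left(1 - \frac{2}{p}\right) + \frac{4\delta}{p^n - p}. \tag{11}$$

On the other hand, the inner product of g with an average over booleanized linear functions is

$$\frac{1}{p^{n-1}} \sum_{b[0]=1} \langle g, \hat L_b \rangle = \frac{1}{p^{n-1}(p^n - p)} \sum_{b[0]=1} \sum_{x \in X_{n,p}} g(x) \hat L_b(x) = \left(1 - \frac{2a}{p^n - p}\right)\left(1 - \frac{2}{p}\right).$$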

Now we apply Lemma 6, setting $d = p^{n-1}$ and $\lambda = \frac{(p-2)^2 p^{n-2} - p}{p^n - p}$. We will obtain an orthonormal
basis, which we denote by $\{\tilde L_b\}$.

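Putting things together, we can compute the Fourier coefficient of g over the component $\tilde L_a$ of the
orthonormal basis. Writing $\hat L_a = 2 L_a - 1$ for the $\pm 1$-valued version of $L_a$, we have

$$\begin{aligned}
\langle g, \tilde L_a \rangle &= \frac{1}{\sqrt{1-\lambda}} \langle g, \hat L_a \rangle - \left( \frac{1}{\sqrt{1-\lambda}} - \frac{1}{\sqrt{1+(d-1)\lambda}} \right) \left(1 - \frac{2a}{p^n - p}\right)\left(1 - \frac{2}{p}\right) \\
&= \frac{1}{\sqrt{1-\lambda}} \cdot \frac{4\delta}{p^n - p} + \frac{1}{\sqrt{1+(d-1)\lambda}} \left(1 - \frac{2a}{p^n - p}\right)\left(1 - \frac{2}{p}\right) \\
&= \frac{2\delta}{\sqrt{(p-1)(p^{n-1}-1)}\; p^{(n-1)/2}} + \frac{1}{p^{(n-1)/2}} \left(1 - \frac{2a}{p^n - p}\right).
\end{aligned}$$

Since $|1 - 2a/(p^n - p)| \le 1$, it follows that

$$|\langle g, \tilde L_a \rangle| \ge \frac{2|\delta|}{\sqrt{(p-1)(p^{n-1}-1)}\; p^{(n-1)/2}} - \frac{1}{p^{(n-1)/2}}.$$

Now we substitute in $|\delta| > \frac{p^{n-1}-1}{2}\,\xi$ with $\xi = 1/p^{n/3}$, and we have, for n large enough,

$$|\langle g, \tilde L_a \rangle| \ge \frac{\xi}{\sqrt{p}} - \frac{1}{p^{(n-1)/2}} \ge \frac{1}{p^{n/3+1}}. \tag{12}$$

Thus g can have at most $p^{2n/3+2}$ such Fourier coefficients, and so there can be at most $p^{2n/3+2}$ predicates
that are not $1/p^{n/3}$-independent from g. ■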

Lemma 7 Let $X_n = \{0,1\}^n \setminus \{0^n\}$ and $C_n$ be the class of negative parity functions over $X_n$. For any query
function $g : \{0,1\}^n \to \{-1,+1\}$, there are at most $2^{n/2+2}$ predicates in $C_n$ that are not $2^{-n/4}$-independent
from g.



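Proof: We fix a negative parity function f. Let a denote the number of $x \in \{0,1\}^n$ such that $g(x) = 1$, and
let b denote the number of $x \in S_f$ such that $g(x) = 1$. Notice that since all parity functions are balanced,
we have $|S_f| = 2^{n-1} - 1$ (since $f(0^n) = 1$ but $0^n \notin S_f$). Then if f is not $\xi$-independent from g, we have

$$\left| \frac{2b - 2^{n-1} + 1}{2^{n-1} - 1} - \frac{2a - 2^n}{2^n} \right| > \xi \tag{13}$$

or

$$\left| \frac{a - 2b}{2^{n-1}} - \frac{2b}{2^{n-1}(2^{n-1} - 1)} \right| > \xi. \tag{14}$$

Here the second term lies between 0 and $4/2^n$, since $0 \le b \le 2^{n-1} - 1$.

Next we perform Fourier analysis. We first define an inner product of real functions over $\{0,1\}^n$:

$$\langle f, g \rangle = \frac{1}{2^n} \sum_{x \in \{0,1\}^n} f(x) g(x). \tag{15}$$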



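We define a set of "modified parity functions" as $\tilde\oplus_s(x) = (-1)^{\neg\oplus_s(x)}$, which map elements in $\{0,1\}^n$ to
$\{-1,+1\}$. It is clear that the set of all modified parity functions $\{\tilde\oplus_s(x)\}_s$ forms an orthonormal basis, and
$\tilde\oplus_s(x) = 1 - 2 \cdot \neg\oplus_s(x)$. If a negative parity function $\neg\oplus_s(x)$ is not $\xi$-independent from g, then (13) holds
(by setting $f = \neg\oplus_s$). Let $t = 1$ if $g(0^n) = +1$ and $t = 0$ otherwise. Within the subset where $\tilde\oplus_s(x) = -1$,
which includes $0^n$ and the positive set of $\neg\oplus_s$, g maps $b + t$ inputs to +1. Outside this subset, g maps
$a - b - t$ inputs to +1, and $2^{n-1} - a + b + t$ inputs to -1. Thus, we can compute the Fourier coefficient of g on $\tilde\oplus_s$:

$$\langle \tilde\oplus_s, g \rangle = 1 - 2 \cdot \Pr_{x \in \{0,1\}^n}[\tilde\oplus_s(x) \ne g(x)] = 1 - 2 \cdot \left( \frac{b + t}{2^n} + \frac{2^{n-1} - a + b + t}{2^n} \right) = \frac{2a - 4b - 4t}{2^n}.$$

Substituting in (14), and noting that the correction from $0^n$ (the term $4t/2^n$) and the correction from
$|S_f| = 2^{n-1} - 1$ are each at most $4/2^n$ and enter with opposite signs, we have

$$|\langle \tilde\oplus_s, g \rangle| \ge \xi - 6/2^n. \tag{16}$$

However, notice that the query function $g(x)$ has norm 1 and thus it can have at most $1/(\xi - 6/2^n)^2$ Fourier
coefficients such that (16) holds. Now plugging in $\xi = 2^{-n/4}$, we have $1/(\xi - 6/2^n)^2 \le 2^{n/2+2}$, and the
lemma is proved. ■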
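
The counting step at the end of the proof rests on Parseval's identity: a function of norm 1 has at most $1/\theta^2$ Fourier coefficients of magnitude at least $\theta$. A small Python sketch for checking this on examples, using a fast Walsh-Hadamard transform with the ordinary characters $\chi_s(x) = (-1)^{s \cdot x}$ (these differ from the modified parity functions above only by a sign, so the magnitudes agree). The function names and the encoding of x as an integer bitmask are illustrative assumptions.

```python
def fourier_coefficients(g_values, n):
    """Return <chi_s, g> for every s, where g_values[x] = g(x) and x is a bitmask."""
    coeffs = list(g_values)
    h = 1
    # In-place fast Walsh-Hadamard transform (butterflies of growing stride).
    while h < len(coeffs):
        for i in range(0, len(coeffs), 2 * h):
            for j in range(i, i + h):
                u, v = coeffs[j], coeffs[j + h]
                coeffs[j], coeffs[j + h] = u + v, u - v
        h *= 2
    return [c / 2 ** n for c in coeffs]


def count_large(coeffs, theta):
    """Number of coefficients with magnitude at least theta."""
    return sum(1 for c in coeffs if abs(c) >= theta)
```

For any $\pm 1$-valued g the squared coefficients sum to 1, so `count_large(coeffs, theta)` can never exceed $1/\theta^2$.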